1 Scalable high-speed prefix matching

Marcel Waldvogel, George Varghese, Jon Turner, Bernhard Plattner 

November 2001 ACM Transactions on Computer Systems (TOCS), volume 19 issue 4 

Full text available: pdf(933.02 KB) Additional Information: full citation, abstract, references, citings, index terms

Finding the longest matching prefix from a database of keywords is an old problem with a number of applications,
ranging from dictionary searches to advanced memory management to computational geometry,
today's most frequent best matching prefix lookups occur in the Internet, when forwarding packets
router. Internet traffic volume and link speeds are rapidly increasing; at the same time, a growing 
is increasing the size of routing tables against which p ... 

Keywords: collision resolution, forwarding lookups, high-speed networking 

2 Data and memory optimization techniques for embedded systems

P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle, P 
April 2001 ACM Transactions on Design Automation of Electronic Systems (TODAES), volume

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

We present a survey of the state-of-the-art techniques used in performing data and memory-related
in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem
one or more out of three important cost metrics: area, performance, and power dissipation of the
implementation. We first examine architecture-independent optimizations in the form of code transformations.
We next cover a broad spectrum of optimizati ... 

Keywords: DRAM, SRAM, address generation, allocation, architecture exploration, code transformation,
cache, data optimization, high-level synthesis, memory architecture customization, memory power
register file, size estimation, survey 



3 Predictor-directed stream buffers

Timothy Sherwood, Suleyman Sair, Brad Calder 

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture

Full text available: pdf(187.39 KB), ps Additional Information: full citation, references, citings, index terms

4 Intraprogram dynamic voltage scaling: Bounding opportunities with analytic modeling

Fen Xie, Margaret Martonosi, Sharad Malik 

September 2004 ACM Transactions on Architecture and Code Optimization (TACO), volume 1 issue
Full text available: pdf(980.11 KB) Additional Information: full citation, abstract, references, index terms

Dynamic voltage scaling (DVS) has become an important dynamic power-management technique 
DVS tunes the power-performance tradeoff to the needs of the application. The goal is to minimize
consumption while meeting performance needs. Since CPU power consumption is strongly dependent
supply voltage, DVS exploits the ability to control the power consumption by varying a processor's
and clock frequency. However, because of the energy and time overhead asso ... 

Keywords: Analytical model, compiler, dynamic voltage scaling, low power, mixed-integer linear 



5 Reducing memory latency via non-blocking and prefetching caches 

Tien-Fu Chen, Jean-Loup Baer 

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international conference on Architectural
support for programming languages and operating systems, volume 27 issue 9

Full text available: pdf Additional Information: full citation, references, citings, index terms



6 Concurrency, latency, or system overhead: which has the largest impact on uniprocessor DRAM-system
performance?

Vinodh Cuppu, Bruce Jacob 

May 2001 ACM SIGARCH Computer Architecture News, Proceedings of the 28th annual international
symposium on Computer architecture, volume 29 issue 2
Full text available: pdf(904.17 KB) Additional Information: full citation, abstract, references, citings, index terms

Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design t 
DRAM system organization. Parameters include the number of memory channels, the bandwidth c 
burst sizes, queue sizes and organizations, turnaround overhead, memory-controller page protocol,
assigning request priorities and scheduling requests dynamically, etc. In this design space, we see
variation in application execution times: for example, ... 

7 Fast detection of communication patterns in distributed executions

Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative
research

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on process-time
diagrams are often used to obtain a better understanding of the execution of the application. The
we use is Poet, an event tracer developed at the University of Waterloo. However, these diagrams
complex and do not provide the user with the desired overview of the application. In our experience
display repeated occurrences of non-trivial commun ... 

8 System-level power optimization: techniques and tools

Luca Benini, Giovanni de Micheli

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), volume
Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We consider electronic
consisting of a hardware platform and software layers. We consider the three major constituents
consume energy, namely computation, communication, and storage units, and we review methods
their energy consumption. We also study models for analyzing the energy cost of software, and
energy-efficient software design and compilation. This survey ...

9 Design and Implementation of High-Performance Memory Systems for Future Packet Buffers

Jorge Garcia, Jesus Corbal, Llorenç Cerdà, Mateo Valero

December 2003 Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

Full text available: pdf Additional Information: full citation, abstract, index terms

In this paper we address the design of a future high-speed router that supports line rates as high as
Gb/s), around one hundred ports and several service classes. Building such a high-speed router would
technological problems, one of them being the packet buffer design, mainly because in router design
to provide worst-case bandwidth guarantees and not just average-case optimizations. A previous p
design provides worst-case bandwidth guarantees by using ...

10 High-speed policy-based packet forwarding using efficient multi-dimensional range matching 

T. V. Lakshman, D. Stiliadis 

October 1998 ACM SIGCOMM Computer Communication Review, Proceedings of the ACM SIGCOMM '98
conference on Applications, technologies, architectures, and protocols for computer
communication, volume 28 issue 4

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The ability to provide differentiated services to users with widely varying requirements is becoming
important, and Internet Service Providers would like to provide these differentiated services using
shared network infrastructure. The key mechanism, that enables differentiation in a connectionless
packet classification function that parses the headers of the packets, and after determining their
them based on administrative policies or re ... 

11 Smart Memories: a modular reconfigurable architecture 

Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, Mark Horowitz 

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual international
symposium on Computer architecture, volume 28 issue 2
Full text available: pdf(80.16 KB) Additional Information: full citation, abstract, references, citings, index terms

Trends in VLSI technology scaling demand that future computing devices be narrowly focused to
performance and high efficiency, yet also target the high volumes and low costs of widely applicable
purpose designs. To address these conflicting requirements, we propose a modular reconfigurable
called Smart Memories, targeted at computing needs in the 0.1μm technology generation. A Smart Memories
chip is made up of many processing tiles, each containing local ... 

12 CRUSADE: hardware/software co-synthesis of dynamically reconfigurable heterogeneous real-time
distributed embedded systems 

Bharat P. Dave 

January 1999 Proceedings of the conference on Design, automation and test in Europe 

Full text available: pdf(59.35 KB) Additional Information: full citation, citings, index terms



13 A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Zhao Zhang, Zhichun Zhu, Xiaodong Zhang 

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture

Full text available: pdf(153.06 KB), ps Additional Information: full citation, references, citings, index terms

14 External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter 

June 2001 ACM Computing Surveys (CSUR), Volume 33 issue 2 

Full text available: pdf(828.46 KB) Additional Information: full citation, abstract, references, citings, index terms

Data sets in large applications are often too massive to fit completely inside the computer's internal
resulting input/output communication (or I/O) between fast internal memory and slower external
as disks) can be a major performance bottleneck. In this article we survey the state of the art in the
analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality
to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external memory, hierarchical
memory, multidimensional access methods, multilevel memory, online, out-of-core, secondary storage



15 Mining block correlations to improve storage performance 

Zhenmin Li, Zhifeng Chen, Yuanyuan Zhou 

May 2005 ACM Transactions on Storage (TOS), volume 1 issue 2

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Block correlations are common semantic patterns in storage systems. They can be exploited for improving
effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information
block correlations is unavailable at the storage system level. Previous approaches for discovering
in file systems do not scale well enough for discovering block correlations in storage systems. In this
propose two algorithms, C-Miner and ... 

Keywords: Storage management, block correlations, file system management, mining methods , 



16 The Clipper processor: instruction set architecture and implementation

W. Hollingsworth, H. Sachs, A. J. Smith 

February 1989 Communications of the ACM, Volume 32 Issue 2 

Full text available: pdf(4.67 MB) Additional Information: full citation, abstract, references, citings, index terms

Intergraph's CLIPPER microprocessor is a high performance, three chip module that implements a 
set architecture designed for convenient programmability, broad functionality, and easy future expansion.

17 A fully associative software-managed cache design

Erik G. Hallnor, Steven K. Reinhardt 

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual international
symposium on Computer architecture, volume 28 issue 2
Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

As DRAM access latencies approach a thousand instruction-execution times and on-chip caches grow
megabytes, it is not clear that conventional cache structures continue to be appropriate. Two key
associativity and software management, have been used successfully in the virtual-memory domain
disk access latencies. Future systems will need to employ similar techniques to deal with DRAM latencies.
paper presents a practical, fully associati ... 

18 Full papers: Tree bitmap: hardware/software IP lookups with incremental updates

Will Eatherton, George Varghese, Zubin Dittia 

April 2004 ACM SIGCOMM Computer Communication Review, volume 34 issue 2
Full text available: pdf Additional Information: full citation, abstract, references

Even with the significant focus on IP address lookup in the published literature as well as focus on
commercial semiconductor vendors, there is still a challenge for router architects to find solutions
simultaneously meet 3 criteria: scaling in terms of lookup speeds as well as table sizes, the ability
speed updates, and the ability to fit into the overall memory architecture of a Level 3 forwarding
packet processor with low systems cost overhead. I ... 

19 A general framework for prefetch scheduling in linked data structures and its application to multi-chain
prefetching 

Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, Donald Yeung 
May 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 2 

Full text available: pdf(2.45 MB) Additional Information: full citation, abstract, references, index terms

Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent
chains. While the traversal of any single pointer chain leads to the serialization of memory operations,
traversal of independent pointer chains provides a source of memory parallelism. This article investigates
exploiting such interchain memory parallelism for the purpose of memory latency tolerance, using
called multi-chain prefetching. Previous work ... 

Keywords: Data prefetching, memory parallelism, pointer-chasing code 



20 A general-purpose compression scheme for large collections

July 2002 ACM Transactions on Information Systems (TOIS), volume 20 issue 3 

Full text available: pdf(260.29 KB) Additional Information: full citation, abstract, references, index terms, review

Compression of large collections can lead to improvements in retrieval times by offsetting the CPU
costs with the cost of seeking and retrieving data from disk. We propose a semistatic phrase-based
called xray that builds a model offline using sample training data extracted from a collection, and 
compresses the entire collection online in a single pass. The particular benefits of xray are that it 
applications where individual records or documents must b ... 

Keywords: phrase-based compression, random access, sampling 